Mon. Not. R. Astron. Soc. 000, 1-7 (2006) Printed 5 February 2008 (MN KTpX style file v2.2) 



Efficient Bayesian inference for multimodal problems in cosmology 



J.R. Shaw, 1 * M. Bridges, 2 M.P. Hobson 2 

1 Institute of Astronomy, Madingley Road, Cambridge, CB3 0HA, UK 

2 Astrophysics Group, Cavendish Laboratory, Madingley Road, Cambridge CB3 0HE, UK 



o 
o 
o 

W) 
< 



(N 
> 

o 

00 

o 
o 
o 

6 



Accepted 2007 April 16. Received 2007 April 5; in original form 5 February 2008 



ABSTRACT 

Bayesian model selection provides the cosmologist with an exacting tool to distinguish be- 
tween competing models based purely on the data, via the Bayesian evidence. Previous meth- 
ods to calculate this quantity either lacked general applicability or were computationally de- 
manding. However, nested sampling (Skilling 2004), which was recently applied successfully 
to cosmology by Muhkerjee et al. 2006, overcomes both of these impediments. Their im- 
plementation restricts the parameter space sampled, and thus improves the efficiency, using 
a shrinking ellipsoidal bound in the n-dimensional parameter space encompassing param- 
eter samples above a decreasing likelihood value. However, if the likelihood function con- 
tains any multi-modality, separated over a significant portion of the parameter space then 
the ellipse is prevented from constraining the sampling region by less than the distance be- 
tween the likelihood peaks. In this paper we introduce a method of clustered nested sampling 
whereby ellipsoidal clusters can form on any peaks identified -thus improving the efficiency 
by a factor which is equal to the ratio of the volumes enclosed by the set of small clustered 
ellipsoids and the large single ellipse that would necessarily be required without clustering. 
In addition we have implemented a method for determining the expectation and variance of 
the final evidence value without the need to use sampling error from repetitions of the al- 
gorithm ; this further reduces the computational load by at least an order of magnitude. We 
have applied our algorithm to a pair of toy models and one cosmological example where we 
demonstrate that the number of likelihood evaluations required is <~ 4% of that necessary 
for using previous algorithms. We have produced a FORTRAN library containing our routines 
which can be called from any sampling code, in addition for convenience we have incorpo- 
rated it into the popular CosmoMC code as CosmoClust. Both are available for download 
at www . mrao . cam .ac.uk/ software/ cosmoclust. 
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1 INTRODUCTION 

Bayesian inference has become an invaluable tool in analysing 
cosmological datasets to place constraints on model parameters. 
However the more fundamental question of model selection, nat- 
urally incorporated in the Bayesian framework by the evidence, 
has been under-utilised in cosmology. Impeded until recently by 
the high computational cost of methods such as thermodynamic in- 
tegration, progress has now been made with the advent of nested 
sampling (Skilling 2004), targeted solely at efficient calculation of 
the evidence but also capable of providing posterior inferences. 
This method has recently been applied to cosmological problems 
by Mukherjee et al. (2006) allowing model selection on feasible 
timescales, typical of Markov Chain Monte Carlo (MCMC) pa- 
rameter estimation. Their algorithm uses an elliptical bound to re- 
strict the prior around the maximum likelihood peak in the posterior 
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and thus improves the acceptance ratio and efficiency. While this 
method can be used for the majority of cosmological applications, 
where posteriors are generally uni-modal, there are examples where 
this is not so. In this paper we describe a clustering nested sampler 
which is capable of detecting and isolating multiple peaks in the 
posterior, fitting separate ellipsoidal bounds around each and mak- 
ing considerable savings in computational load when compared to 
a single ellipse. In addition we have implemented an improved er- 
ror calculation (Skilling 2004) on the final evidence result which 
produces a mean and standard error in one calculation, eliminating 
the need for multiple runs. 



2 BAYESIAN INFERENCE 

A Bayesian analysis provides a coherent approach to the estimation 
of a set of model parameters and, crucially in determining which 
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model, M, best describes the data, D. Bayes' theorem states that 



P(0|D,M) = 



/':!) (-). M )/'((-) M ■ 

P(D|M) : 



(1) 



where P(0|D,M) is the posterior probability distribution, 
P(D|0,M) ee £ the likelihood, P(&\M) = ty the prior, 
P(D\M) = Z the Bayesian evidence. 

Typically, the posterior is generated via a Metropolis-Hastings 
algorithm with MCMC sampling, where at equilibrium the chain 
contains a set of samples from parameter space distributed with 
the posterior probability distribution. In the Bayesian framework 
all of the inference is contained in the final multi-dimensional pos- 
terior, which can be marginalised over each parameter to obtain 
constraints. The Bayesian evidence is represented by the overall 
normalisation of this posterior. The evidence is large in models 
which have a high likelihood over a large proportion of the prior 
parameter space so that we can consider the evidence to be the av- 
erage of the likelihood divided by the prior. As such it forms the 
integral, over the n-dimensional parameter space 



Z = j C(&)n(&)d n &. 



(2) 



Thus a model containing a large number of parameters for which 
only a narrow region of the prior is likely will have a low evidence 
and vice versa, providing a natural mechanism to limit the complex- 
ity of cosmological models and elegantly incorporating Ockham's 
razor. 

A standard scenario in Bayesian model selection would re- 
quire the computation of evidences for two models A and B. The 
difference of log-evidences In Za — In Zb, also called the Bayes 
factor then quantifies how well A may fit the data when compared 
with model B. Jeffreys (1961) provides a scale on which we can 
make qualitative conclusions based on this difference: AlnZ < 1 
is not significant, 1 < AlnZ < 2.5 significant, 2.5 < AlnZ < 5 
strong and AlnZ > 5 decisive. 

The prior dependence of the evidence requires that the en- 
tire parameter space is adequately sampled, the MCMC method 
described above moves rapidly through parameter space toward ar- 
eas of high likelihood leaving the majority of the surrounding prior 
heavily under sampled. This is sufficient to generate parameter con- 
straints where regions close to the peak of the posterior are most 
important, but it radically over estimates the evidence due to un- 
der sampling of regions of low likelihood. In order to sample more 
uniformly the method of thermodynamic integration has previously 
been implemented (see e.g Hobson et al. 2003; Slosar et al. 2003) 
and used successfully by a number of authors (Niarchou et al. 2004, 
Basset et al. 2004, Mukherjee et al. 2006, Trotta 2005, Beltran et al. 
2005, Bridges et al. 2006) in computing the evidence. Thermody- 
namic integration uses MCMC to draw samples not from the poste- 
rior directly but from C x tv where A is the inverse temperature and 
is raised from ~ to 1. For low values of A peaks in the posterior 
are sufficiently flattened to allow improved mobility of the chains 
over the entire prior range. Typically it is possible to obtain accu- 
racies of within 0.5 units in log-evidence via this method however 
this does require of order 10 6 samples per chain (with around 10 
chains required to determine a sampling error) making it at least 
an order of magnitude more costly than the sampling needed for 
parameter estimation. 
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Figure 1. Cartoon illustrating the posterior of a two dimensional problem 
(a) and the transformed C(X) function (b) where the prior masses Xi are 
associated with each likelihood d . 



3 NESTED SAMPLING 

Nested sampling is computationally more efficient as it transforms 
the integral in Eqn. 2 to a single dimension by a suitable re- 
parameterisation in terms of the prior mass X. This mass can be 
divided into elements dX — n(&)d N & which can be combined 
in any order to give say 



X(X)= [ n(G)r&. (3) 

J £(&)>>, 

the prior mass covering all likelihoods above the iso-likelihood 
curve C = A. We also require the function jC(X) to be a singular 
decreasing function (which is trivially satisfied for most posteriors) 
so that using sampled points we can estimate the evidence via the 
integral: 



Z= f C(X)dX. 
Jo 



(4) 



An example of a posterior in two dimensions and its associated 
function C(X) is shown in Fig. 1. 

The transformed integral is dominated by a relatively small 
region of the prior containing the majority of the posterior mass. 
For efficient sampling the division of the prior elements should not 
be linear but geometric: 



Xo = 1,X\ — tiXo, 



Xi — tiXi—i 



x„ 



(5) 



where we would desire a largely constant ti\ the procedure for 
nested sampling ensures just that. 

The method is then to draw an initial set of ./V active samples 
uniformly from the full prior [0, Xo = 1]. The samples are ordered 
in terms of likelihood, the smallest of which, Co is removed from 
the active set. The prior is then reduced according to Eqn. 5 so 
that the region sampled becomes [0, Xi] where Xi = tiXo, a 
point is then found to replenish the active set subject to the criterion 
that its likelihood > Co. This sampling/replenishment cycle is then 
repeated until the entire prior has been traversed. The algorithm 
thus travels through nested shells of likelihood as the prior mass is 
reduced. At this point the evidence is approximated by a numerical 
integration routine such as the trapezoid rule: 



(Xi 



■Xi+i) 



(6) 



The method described above ensures that the U have the distribu- 
tion P(ti) — Ntf -1 , allowing them to be statistically assigned. 
This does introduce an uncertainty, but it is possible to quantify 
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this accurately and relate it to the final Z. If we could assign each 
Xi exactly then the only error would be due to the discretisation of 
the integral which, given sufficient points would be expected to be 
negligible. For this reason the dominant source of error in the final 
Z arises from the incorrect assignment of each prior mass. Fortu- 
itously our knowledge of the distribution P(ti) from which each ti 
is drawn allows us to assess the errors in any quantities we produce. 
Given the probability of a vector t as 



5.274 



p(t)=n^) 



we can write the expectation value of some quantity F(t) as 



(F) 



F(t)P(t)d M t. 



(7) 



(8) 



Evaluating this integral is possible by Monte Carlo methods by 
sampling a given number of vectors t and finding the average F. By 
this method we can essentially determine the variance of the curve 
in X — C space, and thus in the evidence integral J C(X)dX. 

Previous numerical evidence analyses have determined an as- 
sociated error on the final evidence estimate by repeating identical 
calculations to determine the standard error, and for reliable results 
this method requires of order tens of estimates. As our results will 
show the above method is capable of estimating the expectation and 
variance of Z accurately when compared to sampling statistics so 
as to eliminate the need for any repetition. 



3.1 Ellipsoidal Sampling 

Evaluating the likelihood of a point in cosmological parameter 
space is computationally expensive. Nested sampling reduces this 
overhead as it requires fewer likelihood evaluations. However fur- 
ther reductions are possible by restricting the prior region at suc- 
cessive sample/replacement cycles. Drawing blindly from the prior 
probability distribution, will invariably result in an increasing num- 
ber of samples from unlikely regions of the prior. Ellipsoidal sam- 
pling partially overcomes this by approximating the iso-likelihood 
of the point to be replaced by an n-dimensional ellipsoid formed 
from the covariance matrix of the current set of active points. In the 
limit as the ellipsoid approaches that of the likelihood bound, the 
acceptance rate tends to unity. 

Ellipsoidal sampling provides an elegant solution, but there 
are some algorithmic points to note. Firstly, it is imperative that the 
ellipse does not restrict the sampling region to lie within C — d 
to avoid overestimating the evidence. To prevent this, previous au- 
thors (such as Mukherjee et al. 2006) enlarged the ellipse by a fixed 
enlargement factor > 1, so that in most, but not all cases, the ellipse 
will encompass all active points at any stage. An obvious example 
where this may break down would be a rectangular posterior. We 
have chosen instead to define the ellipse to guarantee that at any 
stage all active points are enclosed. If we have determined the co- 
variance matrix C of samples at a given stage, the diagonal matrix 
C' is formed by transformation C' = X T CX where X is the matrix 
of eigenvectors of C, where each eigenvalue is the squared length 
of the ellipse along each principle axis. Thus the ellipsoidal bound 
is defined as 



■C _1 x = k, 



(9) 



where x are the parameter vectors along each axis and where the 
constant scaling length 




Figure 2. In Z as determined for a range of enlargement factors for a mul- 
tiple Gaussian posterior, the analytic value in 5.271. 



with Xi denoting the vectors of the active point set. This ensures 
that all active points are enclosed within the ellipse although there 
does, of course, still exist the possibility that some parts of the re- 
gion C > d lie outside the ellipse and so we still require an en- 
largement factor but one that is generally smaller than that required 
by Mukherjee's method. In Fig 2 evidence estimates are shown cal- 
culated with a range of enlargement factors 1.0 < en < 1.1 as de- 
termined for a relatively complex posterior of multiple Gaussians 
where the analytic ln.Z = 5.271. The algorithm succesfully con- 
verges to the correct value with an enlargement factor of ~ 1.06. 

Computationally it is convenient to draw the random sam- 
ples not from this ellipsoidal bound directly but from a unit sphere 
centred on the origin. Thus, we construct a transformation matrix 
T = fcX T D' where C' = D'D' which maps points from the unit 
sphere into the ellipse (centred at the origin) so a uniform deviate 
x drawn from the sphere forms a point within the ellipse y via 



y = Tx + c. 



(11) 



k = max[xfC x xi] 



(10) 



Generating deviates from a uniform n-sphere can be most eas- 
ily achieved by drawing a set of n Gaussian random numbers z (via 
the Box-Muller algorithm), scaling to give a point uniformly dis- 
tributed on the surface of the sphere z' = ^z where r = z t 
and finally scaling again by a deviate drawn from a quadratic dis- 
tribution between and 1 to generate a point uniformly within the 
sphere. It is worth pointing out that the number of active points TV 
must always be greater than the dimensionality of the parameter 
space in order for the ellipse to be defined unambiguously. 



3.2 Recursive Clustering 

The usefulness of ellipsoidal sampling is reduced for posteriors that 
are not uni-modal. Our extension allows the formation of multi- 
ple ellipsoids centred on individual isolated peaks in the posterior 
which can greatly reduce the region of prior sampled and thus in- 
creases sampling efficiency. The idea is depicted in Fig. 3 in a toy 
posterior and equivalently in Fig. 4 in X — C space, where upon 
identifying distinct clusters they are separated and fit by individual 
ellipsoids. While this does violate the requirement that C(X) be a 
uniquely valued function we are rescued by the linear nature of the 
evidence. It is valid to consider each cluster individually and sum 
the contributions provided we correctly assign the prior masses to 
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(c) (d) 




(e) 



Figure 3. Cartoon of a simple bimodal case. In (a) we see that the ellip- 
soid represents a good bound to the active region. In (b)-(d), as we nest in- 
ward we can see that the acceptance rate will rapidly decrease as the bound 
steadily worsens. Figure (e) illustrates the solution, we branch and sample 
each clustered region in turn. 




X 



Figure 4. View of clustering in C — X space. 



each distinct region. Since our collection of N active points is dis- 
tributed evenly across the prior we can safely assume that the num- 
ber of points within each clustered region is proportional to the 
prior mass contained therein. 

Identifying clusters in the posterior, in the absence of any ana- 
lytical form for the function, is performed by recursive application 
of the K-means algorithm with K — 2. This process is designed 
to place TV data points in an arbitrarily large number of dimensions 
into K clusters. An iterative process the algorithm starts by divid- 
ing the set of samples into two clusters, the mean position of each 
cluster is then determined and each subsequent point is assigned to 
one or other of the two mean positions. The next iteration updates 
the mean positions and so on. In this way a set of sampled data 
can be searched by the alorithm for clustering centres. For a more 
detailed discussion of the i\"-means process see for example see 
Mac Kay 2003. While our implementation does not require the user 



to know beforehand any details about the shape of the posterior, 
the algorithm does contain a number of user definable parameters 
which can guide the clustering. These parameters define two con- 
ditions that must be met once two separate clusters have been iden- 
tified via K-means. The first is a volume reduction factor set so that 
the total volume of the combined clustered ellipsoids is less than 
some fraction of the pre-clustered ellipsoid. The second parameter 
requires that the clusters are sufficiently separated by some distance 
to avoid overlapping regions. We have found reasonable values for 
the former to be ~ 0.3 while the latter seperation criteria requires 
a new ellipse to be at least 1.1 times the existing ellipse major axis 
in each dimension. If these conditions are met clustering will occur 
and the algorithm will search independently within each cluster for 
further sub-clusters. This process continues recursively until some 
stopping criterion (discussed in Sec. 3.3.) is met. So long as condi- 
tion (i) is met we will be guaranteed a performance improvement 
as less of the prior needs to be sampled at each cycle. 

The division of prior mass between two clusters introduces 
a further source of error, and we can no longer simply assess the 
uncertainty in Z. To resolve this we must find the probability of 
the mass fraction in each cluster. If we have a fraction /i in cluster 
1, with a total number of points N, the probability of n\ points in 
cluster 1 is simply binomial: 

We can invert this using Bayes' theorem with a uniform prior to 
yield P(/i|m, N) and draw samples via MCMC which can then 
be used, as in Sec. 3 to generate a variance for each cluster which 
can be summed to give the total error. 



3.3 Stopping Criterion 

We wish to stop the calculation on determining the evidence to 
some specified accuracy. One way would be to proceed until the ev- 
idence estimated at each replacement changes by less than a spec- 
ified tolerance. However, this could for example underestimate the 
evidence in cases where the posterior contains any narrow peaks 
close to its maximum. Skilling (2004) provides an adequate and ro- 
bust condition by determining an upper limit on the evidence that 
could be determined from the remaining set of current active points. 

By selecting the maximum likelihood £ max in the set of active 
points one can then safely assume that the largest evidence contri- 
bution that can be made by the remaining portion of the posterior 
is AZi = £ ma x^i i.e. the product of the remaining prior mass and 
maximum likelihood. We choose to stop when this quantity would 
no longer change the final evidence estimate by some user defined 
factor. 



4 APPLICATIONS 

To demonstrate the capabilities of clustered nested sampling we 
now take three specific examples; the first two involve posteriors 
of known functional form so that an analytic evidence can be com- 
pared with that found through our nested sampling algorithm and 
the final example demonstrates a cosmological application involv- 
ing a bi-modal posterior. 
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Figure 5. Toy Model I: Posterior consisting of 3 Gaussian peaks placed at 
the vertices of an equilateral triangle in the x-y plane at a distance of 0.5 
units from the origin with a = 0.1. The clusters found are depicted in 
colour at each peak. 



Figure 6. Toy Model II: Posterior consisting of 5 Gaussian peaks spaced 
randomly in the x-y plane of varying width and height. The clusters found 
are depicted in colour at each peak. 



Table 1. Summary of sampling statistics with and without clustering for the 
3-Gaussian posterior with a log-evidence, determined analytically of 2.813. 



Toy Model I: 


Clustering 


No Clustering 


In Z 


2.798 


2.830 


Mike 


12,115 


258,361 


Sampling Error 


0.063 


0.060 


Calculated Error 


0.061 


0.059 



Table 2. Summary of sampling statistics with and without clustering for 
the multi-Gaussian posterior with a log-evidence determined analytically 
of 5.271. 



Toy Model II: 


Clustering 


No Clustering 


In Z 


5.296 


5.230 


Mike 


1,016,994 


3,615,300 


Sampling Error 


0.083 


0.085 


Calculated Error 


0.084 


0.084 



4.1 Toy model I 

Our first example posterior consists of 3-Gaussian peaks in a two 
dimensional parameter space. The clustering (see Fig. 5) algorithm 
first divides the space into cluster 1 and cluster 2+3, runs to the 
required accuracy on cluster 1 then returns to the combination of 
clusters 2+3, immediately clusters to separate 2+3 then running 
both successively to completion. Table 1 shows that with and with- 
out clustering we can determine the evidence to within 2 % of the 
analytic value. To obtain similar accuracy, however the clustered 
example required only 5 % of the number of likelihood evaluations. 
Our calculated errors agree to within 10 % of the sampling errors 
based on 50 repetitions. 



4.2 Toy model II 

Posteriors that occur in typical applications may be far more com- 
plicated than that used in Sec. 4.1. Our second example uses a com- 
bination of five Gaussians of varying width and height in the x-y 
plane (see Fig. 6). Clustering occurs naturally in the expected fash- 
ion, producing a improvement in efficiency of at least a factor of 3 
(see Table 2). Note in particular the algorithm correctly isolates the 
slim Gaussian in cluster 5 superimposed on the larger Gaussian at 
cluster 4. 



4.3 Cosmological application 

Recent measurements of the Cosmic Microwave Background 
(CMB) by WMAP indicate a deficiency of power on large scales 



when compared with a flat ACDM cosmological model. One pos- 
sibility is that this decrease is caused by a feature in the primor- 
dial spectrum of density perturbations, specifically a sharp cutoff 
on scales k ^ 0.0003 Mpc -1 . Many analyses to date have exam- 
ined various forms for this cutoff (for example Efstathiou 2003; 
Bridle et al. 2003; Shafieloo & Souradeep 2003; Lasenby & Doran 
2005), here we will examine a model with favourable likelihood as 
determined by Sinha & Souradeep (2006) using WMAP 1-year TT 
data (Bennett et al. 2003). The posterior derived for this model/data 
combination contains substantial multi-modality in the parameters 
defining the spectrum. 

Physically a sharp cutoff can be generated by a kink in the 
inflationary potential (Starobinsky 1992), followed by a 'bump' and 
damped oscillations at larger k. This effect can be modelled via 
a transfer function applied to any underlying spectrum P(k) — 
P (k)T 2 (y, R*) where y = k/k, and 



T 2 {y,R,) = 1-3(E,-1) 



9 



1 \ 2 

sin 2y H — cos 2y 



1 + ^ + 
y 



1 \ 2 

— 1 cos 2y sin 2y 



(13) 



R, is the ratio of inflationary potential and scalar field where fc* 
is the cutoff wave-vector. The best fit model as found by Sinha & 
Souradeep (2006) was a slight variant on this with an exponential 
cutoff such that the underlying spectrum is a single index spectrum 
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Figure 7. Cosmological Application: Set of replacement samples in the cut- 
off scale k c and a with clustered regions shown. 



with exponential decay, governed by a factor a: 

P(k) = A(l-e (0 - r&y)a )k n - 1 T 2 



(14) 



We have restricted the analysis to the set of parameters defining the 
spectrum only (n s , k c , a, R*, A s ) and fixed the remaining parame- 
ters to their best fit values as determined via MCMC using the Cos- 
MOMC package (Lewis & Bridle 2002) (namely Q, b h 2 = 0.0243, 
Q c h 2 = 0.115, 6 = 1.04 and r = 0.22). The results of Starobin- 
sky (1992) suggest that the posterior is bimodal in the subspace 
(a, k c ), andso this problem provides an ideal example of the appli- 
cability of clustering. The algorithm correctly identifies both peaks 
and clusters accordingly (see Fig. 7). The marginalised posteriors 
for each parameter are shown in Fig. 8 

The maximum likelihood point in the marginalised posterior 
of fe c is clearly centred at 0.0003 Mpc -1 (see Fig. 8) with a sec- 
ondary peak also visible at approximately 0.0006 Mpc -1 . These 
features provide an ideal example of the applicability of clustering. 
The algorithm correctly identifies both peaks and clusters accord- 
ingly (see Fig. 7). 

We completed 10 separate evaluations of the evidence via the 
three most common methods for comparison. Table 3 summarises 
the evidences obtained, their associated errors in sampling (i.e. the 
standard error over the 10 runs), the calculated error estimated via 
the method described in Sec. 3.3 and the number of likelihood calls 
made per individual run. Both nested sampling methods provide 
significant improvements in Mike over thermodynamic integration 
1 . With clustering however we see a further performance increase 
by more than a factor of 2 for similar statistical uncertainties. More- 
over, the sampling error and our calculated error show extremely 
good agreement, eliminating the need to perform more than one 
evidence calculation and thus further reducing the computational 
cost by an order of magnitude. 




3.1 



3.2 



3.3 
A 



3.4 



3.5 



Figure 8. Cosmological Application: 1-D marginalised posteriors for model 
spectral parameters, comparing those obtained via MCMC (red) and nested 
sampling (black). 



Table 3. Sampling statistics for the Starobinsky model via nested sampling 
(both clustered and without) and thermodynamic integration (TI). 



Example III: 


Clustering 


No Clustering 


TI 


In Z 


494.65 


494.45 


493.76 


Mike 


96,720 


250,523 


> 400,000 


Sampling Error 


0.29 


0.31 


0.87 


Calculated Error 


0.30 


0.29 





simulated 'toy' models and a cosmological example based on the 
Starobinsky primordial power spectrum -in which a substantial 
performance increase was noted. Although we have examined here 
a multi-modal cosmological example, our sampler is by no means 
restricted to such use. In fact we feel an important feature is the 
ability to determine the variance of the final evidence value, for 
any model, without calculating the sample variance over repetitive 
runs. This alone reduces the computational load by at least an order 
when compared with previous methods. 



5 CONCLUSIONS 

We have demonstrated the applicability of our clustered nested 
sampling algorithm to multi-modal posterior distributions in both 

1 Note that although thermodynamic integration took only twice the num- 
ber of likelihood calls its associated sampling error is still significantly 
larger. 
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